Theano

What convention is used for matrices in machine learning?

Rows correspond to examples.
Columns correspond to the dimensions (features) of the data.

Thus, an input of shape [10, 5] corresponds to 10 examples, each with 5 dimensions.
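
As a minimal sketch of this convention (the array contents are placeholder zeros, chosen only for illustration), the shape of a NumPy array can be read as (examples, dimensions):

```python
import numpy

# Hypothetical dataset: 10 examples, each with 5 features (all zeros here).
data = numpy.zeros((10, 5))

n_examples, n_features = data.shape
print(n_examples, n_features)

# Indexing a row returns one example; indexing a column returns one
# feature across all examples.
first_example = data[0]       # shape (5,)
first_feature = data[:, 0]    # shape (10,)
print(first_example.shape, first_feature.shape)
```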

The asarray function converts a sequence of data into a NumPy array. It belongs to the numpy library.



In [ ]:

    
import numpy
numpy.asarray([[1., 2], [3, 4], [5, 6]])



In [ ]:

    
numpy.asarray([[1., 2], [3, 4], [5, 6]]).shape



In [ ]:

    
numpy.asarray([[1., 2], [3, 4], [5, 6]])[2, 0]



In [ ]:

    
a = numpy.asarray([1.0, 2.0, 3.0])
b = 2.0
a * b

Algebra

Theano is a compiler for symbolic expressions. What does this mean? We first write the equations, and Theano takes care of evaluating them with the data we supply later.

It is the closest thing to writing an equation exactly as it appears in papers.
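
As a rough analogy only (this is plain Python, not how Theano is implemented), the idea of writing an expression first and supplying the data later can be sketched with two tiny hypothetical classes, `Var` and `Add`:

```python
class Var:
    """A named placeholder with no value attached (hypothetical helper)."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Record the addition instead of performing it.
        return Add(self, other)

    def evaluate(self, env):
        # Look up the value supplied at call time.
        return env[self.name]


class Add:
    """A node that records 'left + right' without computing it."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


u = Var('u')
v = Var('v')
expr = u + v                  # builds an expression; nothing is computed yet

result = expr.evaluate({'u': 2.0, 'v': 3.0})   # data is supplied only now
print(result)
```

Theano goes further than this sketch: it compiles the expression before running it, which is why it is described as a compiler.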

The first example shows how to define a function. There are three basic steps:

1. Declare the types of the data the function will use.
2. Write the expression for the function.
3. Use theano.function so that Theano compiles the function expressed in step 2.

Let's look at an example:



In [ ]:

    
import theano.tensor as T
from theano import function

# Here we import the two Theano features used throughout: the tensor module (as T) and function

The tensor types that Theano provides are listed in the following table:

Constructor	dtype	ndim	shape	broadcastable
bscalar	int8	0	()	()
bvector	int8	1	(?,)	(False,)
brow	int8	2	(1,?)	(True, False)
bcol	int8	2	(?,1)	(False, True)
bmatrix	int8	2	(?,?)	(False, False)
btensor3	int8	3	(?,?,?)	(False, False, False)
btensor4	int8	4	(?,?,?,?)	(False, False, False, False)
wscalar	int16	0	()	()
wvector	int16	1	(?,)	(False,)
wrow	int16	2	(1,?)	(True, False)
wcol	int16	2	(?,1)	(False, True)
wmatrix	int16	2	(?,?)	(False, False)
wtensor3	int16	3	(?,?,?)	(False, False, False)
wtensor4	int16	4	(?,?,?,?)	(False, False, False, False)
iscalar	int32	0	()	()
ivector	int32	1	(?,)	(False,)
irow	int32	2	(1,?)	(True, False)
icol	int32	2	(?,1)	(False, True)
imatrix	int32	2	(?,?)	(False, False)
itensor3	int32	3	(?,?,?)	(False, False, False)
itensor4	int32	4	(?,?,?,?)	(False, False, False, False)
lscalar	int64	0	()	()
lvector	int64	1	(?,)	(False,)
lrow	int64	2	(1,?)	(True, False)
lcol	int64	2	(?,1)	(False, True)
lmatrix	int64	2	(?,?)	(False, False)
ltensor3	int64	3	(?,?,?)	(False, False, False)
ltensor4	int64	4	(?,?,?,?)	(False, False, False, False)
dscalar	float64	0	()	()
dvector	float64	1	(?,)	(False,)
drow	float64	2	(1,?)	(True, False)
dcol	float64	2	(?,1)	(False, True)
dmatrix	float64	2	(?,?)	(False, False)
dtensor3	float64	3	(?,?,?)	(False, False, False)
dtensor4	float64	4	(?,?,?,?)	(False, False, False, False)
fscalar	float32	0	()	()
fvector	float32	1	(?,)	(False,)
frow	float32	2	(1,?)	(True, False)
fcol	float32	2	(?,1)	(False, True)
fmatrix	float32	2	(?,?)	(False, False)
ftensor3	float32	3	(?,?,?)	(False, False, False)
ftensor4	float32	4	(?,?,?,?)	(False, False, False, False)
cscalar	complex64	0	()	()
cvector	complex64	1	(?,)	(False,)
crow	complex64	2	(1,?)	(True, False)
ccol	complex64	2	(?,1)	(False, True)
cmatrix	complex64	2	(?,?)	(False, False)
ctensor3	complex64	3	(?,?,?)	(False, False, False)
ctensor4	complex64	4	(?,?,?,?)	(False, False, False, False)
zscalar	complex128	0	()	()
zvector	complex128	1	(?,)	(False,)
zrow	complex128	2	(1,?)	(True, False)
zcol	complex128	2	(?,1)	(False, True)
zmatrix	complex128	2	(?,?)	(False, False)
ztensor3	complex128	3	(?,?,?)	(False, False, False)
ztensor4	complex128	4	(?,?,?,?)	(False, False, False, False)

In our first example we use two variables of type dscalar, which corresponds to 64-bit floating-point numbers.

Note: Currently, when moving data to the GPU it is better to store it as 32-bit values.
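
To illustrate the difference between the two precisions, the following NumPy sketch (the array size is arbitrary) casts a float64 array to float32 and compares memory use. It only shows the cast itself; how Theano transfers the data to the GPU is not covered here.

```python
import numpy

a64 = numpy.ones((1000, 5), dtype=numpy.float64)
a32 = a64.astype(numpy.float32)   # same values, half the bytes per element

print(a64.dtype, a64.nbytes)
print(a32.dtype, a32.nbytes)
```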



In [ ]:

    
x = T.dscalar('x')
y = T.dscalar('y')

# This is the first step: we define the variables used in the expression of the function we want to implement.



In [ ]:

    
z = x + y

# This is the symbolic expression of the function.
# Note that there is still no data.

Finally, what remains is for Theano to compile the symbolic expression.

To do this, Theano uses function. Let's look at what this tool provides:

    function.function(inputs, outputs, mode=None, updates=None, givens=None, no_default_updates=False, accept_inplace=False, name=None, rebuild_strict=True, allow_input_downcast=None, profile=None, on_unused_input='raise')

As you can see, function accepts many parameters. For now we focus on three: inputs, outputs and name.

Inputs: list of the parameters required by the function to be run
Outputs: the expression to compute
Name: optional field that lets us assign a name to our function



In [ ]:

    
f = function([x, y], z, name='suma')
f(2,3)

Naturally, this works with any other data type:



In [ ]:

    
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = function([x, y], z)
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])

We can even compute more than one expression at once:



In [ ]:

    
a, b = T.dmatrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = function([a, b], [diff, abs_diff, diff_squared])
f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

And we can define default values by importing Param from Theano:



In [ ]:

    
# Note: later Theano releases deprecate Param in favor of theano.In, e.g. In(y, value=1)
from theano import Param
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, Param(y, default=1)], z)
f(33)



In [ ]:

    
x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)
f(33)

Shared variables

The code below introduces a few new concepts. The shared function constructs so-called shared variables. These are hybrid symbolic and non-symbolic variables whose value may be shared between multiple functions. Shared variables can be used in symbolic expressions just like the objects returned by dmatrices(...), but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. The value can be accessed and modified with the .get_value() and .set_value() methods.

The other new thing is the updates parameter of function. updates must be supplied with a list of pairs of the form (shared-variable, new expression). It can also be a dictionary whose keys are shared variables and whose values are the new expressions. Either way, it means "whenever this function runs, it will replace the value of each shared variable with the result of the corresponding expression". In the example below, the accumulator replaces the value of state with the sum of state and the increment amount.

Note: To make sure the data is loaded onto the GPU, it must be stored in shared variables.

The following example introduces the updates parameter: after the function runs, each listed variable is updated with the expression paired with it.

updates=[(variable, update expression)]

Note that the function in the example simply returns the value of "state", which is then updated by the expression given in the "updates" parameter. The returned value is therefore the one "state" had before the update.



In [ ]:

    
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])



In [ ]:

    
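print(state.get_value())   # 0: the initial value
print(accumulator(1))      # 0: the function returns state as it was before the update
print(state.get_value())   # 1: the update has been applied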
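
The order of events (output first, update second) can be mimicked in plain Python. The sketch below is only an analogy: `SharedValue` and `make_accumulator` are hypothetical helpers, not part of Theano, and nothing is compiled.

```python
class SharedValue:
    """Hypothetical stand-in for a shared variable: it just holds a value."""
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_accumulator(shared_state):
    """Mimic outputs=state with updates=[(state, state + inc)]."""
    def accumulate(inc):
        old = shared_state.get_value()       # output uses the current value
        shared_state.set_value(old + inc)    # then the update is applied
        return old
    return accumulate


py_state = SharedValue(0)
py_accumulator = make_accumulator(py_state)

returned = py_accumulator(1)      # value before the update
stored = py_state.get_value()     # value after the update
print(returned, stored)
```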

Logistic regression

Sigmoid function

\begin{align} s(x) &= \dfrac{1}{1+e^{-x}} \end{align}



In [ ]:

    
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = function([x], s)
logistic([[0, 1], [-1, -2]])

To plot this function in Python:



In [ ]:

    
import matplotlib.pyplot as plt

x = T.dscalar('x')
s = 1 / (1 + T.exp(-x))
logistic = function([x], s)

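# Evaluate the compiled function on a grid from -6.0 to 5.9.
# The lists are named xs/ys so they do not shadow the symbolic variable x.
nums = range(-60, 60)
xs = []
ys = []
for i in nums:
    xs.append(i / 10.)
    ys.append(logistic(i / 10.))

plt.plot(xs, ys)
plt.show()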

tanh function



In [ ]:

    
# x must be a symbolic matrix again: the plotting cell above rebinds the name x
x = T.dmatrix('x')
s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = function([x], s2)
logistic2([[0, 1], [-1, -2]])
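
The expression (1 + tanh(x/2)) / 2 is the sigmoid written in another form. As a quick check that does not depend on Theano, this NumPy sketch evaluates both forms on the same input values used in the cell and compares them:

```python
import numpy

x_values = numpy.array([[0.0, 1.0], [-1.0, -2.0]])

sigmoid = 1.0 / (1.0 + numpy.exp(-x_values))
via_tanh = (1.0 + numpy.tanh(x_values / 2.0)) / 2.0

# Both forms should agree up to floating-point rounding.
print(numpy.allclose(sigmoid, via_tanh))
```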



In [ ]:

    
import matplotlib.pyplot as plt

x = T.dscalar('x')
s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = function([x], s2)


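# Evaluate the compiled function on a grid from -6.0 to 5.9.
# The lists are named xs/ys so they do not shadow the symbolic variable x.
nums = range(-60, 60)
xs = []
ys = []
for i in nums:
    xs.append(i / 10.)
    ys.append(logistic2(i / 10.))

plt.plot(xs, ys)
plt.show()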

Algorithm

Let's define the logistic regression learning algorithm.



In [ ]:

    
import numpy
import theano
import theano.tensor as T
rng = numpy.random

steps=300000
feats=2
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name='b')
print("Initial model:")
print("w (shape): " + repr(w.get_value().shape))
print("b (value): " + repr(b.get_value()))

import scipy.io as io
        
print('... loading data')
data=io.loadmat('dataLR.mat',squeeze_me=True)
dataIn=data['data'][:,0:2].astype(theano.config.floatX)
dataOut = data['data'][:,2].astype(theano.config.floatX)

'''N = 400
feats = 2
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
#dataIn=D[0]
#dataOut=D[1]
'''

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))        # Probability that target = 1
prediction = p_1 > 0.5                         # The prediction, thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1)  # Cross-entropy loss function
cost = xent.mean() + 0.01*(w ** 2).mean()/2.   # The cost to minimize (cross-entropy plus a penalty on w)
gw = T.grad(cost, w)                           # Gradient of the cost with respect to w
gb = T.grad(cost, b)                           # Gradient of the cost with respect to b
        
# Compile
train = theano.function(
          inputs=[x,y],
          outputs=prediction,
          updates=[(w, w - 0.1 * gw),(b, b - 0.1 * gb)])
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(steps):
    pred = train(dataIn, dataOut)

print("Expected values: " + str(dataOut))
pp = predict(dataIn)
print("Predicted values: " + str(pp))

# Fraction of examples whose prediction matches the target
print("Accuracy: " + str((pp == dataOut).mean()))
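
T.grad derives the gradients symbolically, so they never appear explicitly in the cell. As a sketch of what they amount to for this particular cost, the NumPy code below uses hand-derived gradients, checks one of them with a finite difference, and runs a few gradient-descent steps with the same 0.1 learning rate. The synthetic data and the names `cost_and_grads`, `X_demo`, `y_demo`, `w_demo` and `b_demo` are assumptions made for illustration; dataLR.mat is not used.

```python
import numpy

def cost_and_grads(w, b, X, y):
    """Cost and hand-derived gradients for the regularized cross-entropy."""
    p_1 = 1.0 / (1.0 + numpy.exp(-(X.dot(w) + b)))
    xent = -y * numpy.log(p_1) - (1 - y) * numpy.log(1 - p_1)
    cost = xent.mean() + 0.01 * (w ** 2).mean() / 2.0
    gw = X.T.dot(p_1 - y) / len(y) + 0.01 * w / w.size
    gb = (p_1 - y).mean()
    return cost, gw, gb

# Synthetic, linearly separable data (an assumption for this sketch).
demo_rng = numpy.random.RandomState(0)
X_demo = demo_rng.randn(50, 2)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)
w_demo = demo_rng.randn(2)
b_demo = 0.0

cost0, gw0, gb0 = cost_and_grads(w_demo, b_demo, X_demo, y_demo)

# Central finite difference for the gradient with respect to b.
eps = 1e-6
numeric_gb = (cost_and_grads(w_demo, b_demo + eps, X_demo, y_demo)[0] -
              cost_and_grads(w_demo, b_demo - eps, X_demo, y_demo)[0]) / (2 * eps)
print(abs(numeric_gb - gb0) < 1e-6)

# A few gradient-descent steps, mirroring the updates in the train function.
w_fit, b_fit = w_demo.copy(), b_demo
for _ in range(100):
    _, gw_step, gb_step = cost_and_grads(w_fit, b_fit, X_demo, y_demo)
    w_fit = w_fit - 0.1 * gw_step
    b_fit = b_fit - 0.1 * gb_step
final_cost = cost_and_grads(w_fit, b_fit, X_demo, y_demo)[0]
print(final_cost < cost0)
```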



In [ ]:

    
import matplotlib.pyplot as plt
import numpy as np

# Split the inputs by predicted class using boolean masks
pos = dataIn[pp == 1]
neg = dataIn[pp != 1]

plt.plot(pos[:, 0], pos[:, 1], 'ro', neg[:, 0], neg[:, 1], 'go')
plt.axis([25, 100, 25, 100])
plt.show()


